Tailoring data source distributions for fairness-aware data integration
Abstract
Data scientists often develop data sets for analysis by drawing upon sources of data available to them. A major challenge is to ensure that the data set used for analysis has an appropriate representation of relevant (demographic) groups: it meets desired distribution requirements. Whether data is collected through some experiment or obtained from some data provider, the data from any single source may not meet the desired distribution requirements. Therefore, a union of data from multiple sources is often required. In this paper, we study how to acquire such data in the most cost-effective manner, for typical cost functions observed in practice. We present an optimal solution for binary groups when the underlying distributions of the data sources are known and all data sources have equal costs. For the generic case with unequal costs, we design an approximation algorithm that performs well in practice. When the underlying distributions are unknown, we develop an exploration-exploitation based strategy with a reward function that captures the cost and the approximations of the group distributions in each data source. Besides theoretical analysis, we conduct comprehensive experiments that confirm the effectiveness of our algorithms.
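To make the problem setting concrete, the following is a minimal sketch of acquiring samples from several sources until per-group target counts are met. It is not the paper's optimal or approximation algorithm; it is a simple greedy heuristic chosen for illustration. The function name `greedy_collect`, the per-draw cost model, and the example sources are all assumptions introduced here, and the group distributions are assumed to be known.

```python
import random


def greedy_collect(sources, costs, targets, seed=0, max_draws=100_000):
    """Illustrative greedy heuristic (an assumption, not the paper's method).

    sources: list of dicts mapping group -> probability (assumed known).
    costs:   cost of one draw from each source.
    targets: dict mapping group -> required number of samples.
    Returns (collected counts per group, total cost paid).
    """
    rng = random.Random(seed)
    remaining = dict(targets)
    collected = {g: 0 for g in targets}
    total_cost = 0.0
    draws = 0

    while any(n > 0 for n in remaining.values()):
        if draws >= max_draws:
            raise RuntimeError("draw budget exhausted before targets were met")

        # Focus on the group with the largest remaining deficit.
        focus = max(remaining, key=remaining.get)

        # Choose the source most likely to yield that group per unit cost.
        best = max(
            range(len(sources)),
            key=lambda i: sources[i].get(focus, 0.0) / costs[i],
        )
        if sources[best].get(focus, 0.0) == 0.0:
            raise ValueError(f"no source can supply group {focus!r}")

        # Simulate one draw from the chosen source and pay its cost.
        groups = list(sources[best])
        weights = [sources[best][g] for g in groups]
        drawn = rng.choices(groups, weights=weights, k=1)[0]
        total_cost += costs[best]
        draws += 1

        # Keep the sample only if its group is still needed.
        if remaining.get(drawn, 0) > 0:
            remaining[drawn] -= 1
            collected[drawn] += 1

    return collected, total_cost


if __name__ == "__main__":
    # Hypothetical sources over two groups, with equal per-draw costs.
    example_sources = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
    counts, cost = greedy_collect(example_sources, [1.0, 1.0], {"a": 5, "b": 5})
    print(counts, cost)
```

The sketch pays for every draw, including draws whose group is no longer needed, which is why the choice of source matters for the total cost. When the distributions are unknown, the probabilities used in the source-selection step would have to be replaced by running estimates, which is where an exploration-exploitation strategy of the kind the abstract mentions comes in.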
Similar resources
Data Concern Aware Querying for the Integration of Data Services
There is an increasing trend for organizations to publish data over the web using data services. The published data is often associated with data concerns like privacy, licensing, pricing, quality of data, etc. This raises several new challenges. For instance, it must be ensured that data consumers utilize the data in the right way and are bound to the rules and regulations defined by the data ...
Data Source Selection for Information Integration in Big Data Era
In Big data era, information integration often requires abundant data extracted from massive data sources. Due to a large number of data sources, data source selection plays a crucial role in information integration, since it is costly and even impossible to access all data sources. Data Source selection should consider both efficiency and effectiveness issues. For efficiency, the approach shou...
Learning Source Description for Data Integration
To build a data-integration system, the application designer must specify a mediated schema and supply the descriptions of data sources. A source description contains a source schema that describes the content of the source, and a mapping between the corresponding elements of the source schema and the mediated schema. Manually constructing these mappings is both labor-intensive and error-prone,...
Source Integration in Data Warehousing
Source Integration is one of the core problems in Data Warehousing. Two critical factors for the design and maintenance of applications requiring Source Integration, and in particular Data Warehouse applications, are conceptual modeling of the domain, and reasoning support over the conceptual representation. We present a novel approach to conceptual modeling for Source Integration, which allows...
Learning Source Descriptions for Data Integration
To build a data-integration system, the application designer must specify a mediated schema and supply the descriptions of data sources. A source description contains a source schema that describes the content of the source, and a mapping between the corresponding elements of the source schema and the mediated schema. Manually constructing these mappings is both labor-intensive and error-prone,...
Journal
Journal title: Proceedings of the VLDB Endowment
Year: 2021
ISSN: 2150-8097
DOI: https://doi.org/10.14778/3476249.3476299